In The Claims: 



1. (Currently Amended) A system for cataloguing electronic information, 
comprising: 

an electronic device that captures audio/video data corresponding to a
photographic target, said audio/video data including a narration
concurrently provided by a narrator specifically to identify
where respective subject matter locations are positioned in said
audio/video data;

a speech recognition engine that automatically performs a speech
recognition process upon said narration to generate labels that
correspond to said respective subject matter locations in said
audio/video data, said labels being text conversions of utterances in
said narration, said labels each being specifically aligned with
corresponding ones of said respective subject matter locations within
said audio/video data; and

a label manager that manages a label mode for generating and storing said 
labels, said label manager also controlling a label search mode for
utilizing said labels to automatically locate said
respective subject matter locations in said audio/video data. 

2. (Original) The system of claim 1 wherein said electronic device is implemented 
as an audio/video camcorder device.

3. (Original) The system of claim 1 wherein said speech recognition engine is
configured in a simplified configuration that efficiently compares said narration with 
acoustic models to identify phone strings that represent said narration, said speech 
recognition engine referencing a compact dictionary to look up recognized vocabulary 
words that correspond to said phone strings, said speech recognition engine utilizing 
a limited set of recognition grammar to form said recognized vocabulary words into 
said labels that are supported by said speech recognition engine. 

4. (Original) The system of claim 1 wherein said label manager initially instructs 
said electronic device to enter a real-time label mode for creating and storing said 
labels, said electronic device concurrently capturing said audio/video data and said
narration after said label manager instructs said electronic device to enter said real- 
time label mode. 

5. (Original) The system of claim 1 wherein said electronic device enters a real- 
time label mode in response to a verbal label-mode command from a system user, 
said verbal label-mode command being recognized and provided to said label 
manager by said speech recognition engine. 

6. (Original) The system of claim 1 wherein said speech recognition engine 
automatically generates said labels as said electronic device captures said 
audio/video data and said narration.

7. (Original) The system of claim 1 wherein a post processor performs a post- 
processing procedure upon said labels in a real-time label mode, said post- 
processing procedure including a validation procedure using one or more confidence 
measures to eliminate invalid labels that fail to satisfy pre-determined validation 
criteria.

8. (Original) The system of claim 1 wherein said label manager stores said
labels during a real-time label mode, said labels being stored along with meta- 
information that associates each of said respective subject matter locations to a 
corresponding one of said labels. 

9. (Original) The system of claim 1 wherein said electronic device initially 
captures said audio/video data and said narration prior to entering said label
mode. 

10. (Original) The system of claim 1 wherein said label manager instructs said 
electronic device to enter a non-real-time label mode for creating and storing said 
labels, said electronic device responsively retrieving and playing back said 
audio/video data and said narration.

11. (Original) The system of claim 1 wherein said speech recognition engine 
automatically generates said labels by analyzing said audio/video data and said
narration as said electronic device plays back said audio/video data and said
narration. 

12. (Original) The system of claim 1 wherein a post processor performs a post- 
processing procedure upon said labels in a non-real-time label mode, said post- 
processing procedure including a validation procedure using one or more confidence 
measures to eliminate invalid labels that fail to satisfy pre-determined validation 
criteria. 

13. (Original) The system of claim 1 wherein said label manager coordinates a 
label validation procedure for validating said labels, said label manager generating a 
validation graphical user interface upon a display of said electronic device for a 
system user to interactively evaluate, delete, and edit said labels.

14. (Original) The system of claim 1 wherein said label manager coordinates a
label validation procedure for validating said labels in response to verbal validation 
commands from a system user, said verbal validation commands being recognized 
and provided to said label manager by said speech recognition engine. 

15. (Original) The system of claim 1 wherein said label manager stores said labels 
in a non-real-time label mode, said labels being stored along with meta-information
that associates each of said respective subject matter locations to a corresponding
one of said labels. 

16. (Original) The system of claim 1 wherein said label manager instructs said 
electronic device to enter said label search mode during which a system user 
interactively selects a search label for performing a label search procedure to locate a 
specific one of said respective subject matter locations corresponding to said search 
label. 

17. (Original) The system of claim 1 wherein said label manager generates a label- 
search GUI on a display of said electronic device, a system user viewing said labels 
and corresponding representative images from said audio/video data for selecting a
search label. 

18. (Original) The system of claim 1 wherein a system user selects a search label 
by issuing a verbal search-label command, said verbal search-label command being 
recognized and provided to said label manager by said speech recognition engine. 

19. (Original) The system of claim 1 wherein said label manager instructs said 
electronic device to automatically locate and retrieve a specific one of said respective 
subject matter locations in response to a system user selecting a search label.

20. (Original) The system of claim 1 wherein said electronic device automatically
plays back a specific retrieved one of said respective subject matter locations from
said audio/video data for viewing by said system user.

21. (Currently Amended) A method for cataloguing electronic information, 
comprising: 

capturing audio/video data corresponding to a photographic target by
utilizing an electronic device, said audio/video data including a
narration concurrently provided by a narrator specifically to
identify where respective subject matter locations are positioned in
said audio/video data;

providing a speech recognition engine that automatically performs a speech 
recognition process upon said narration to generate text labels that 
correspond to said respective subject matter locations in said 
audio/video data, said text labels being text conversions of
utterances in said narration, said labels each being specifically
aligned with corresponding ones of said respective subject matter
locations within said audio/video data;

managing a label mode for generating and storing said text labels by 
utilizing a label manager; and 

controlling a label search mode with said label manager, said label search 
mode utilizing said text labels to automatically locate said respective 
subject matter locations in said audio/video data.

22. (Original) The method of claim 21 wherein said electronic device is 
implemented as an audio/video camcorder device.

23. (Original) The method of claim 21 wherein said speech recognition engine is
configured in a simplified configuration that efficiently compares said narration with 
acoustic models to identify phone strings that represent said narration, said speech 
recognition engine referencing a compact dictionary to look up recognized vocabulary 
words that correspond to said phone strings, said speech recognition engine utilizing 
a limited set of recognition grammar to form said recognized vocabulary words into 
said text labels that are supported by said speech recognition engine. 

24. (Original) The method of claim 21 wherein said label manager initially 
instructs said electronic device to enter a real-time label mode for creating and 
storing said text labels, said electronic device concurrently capturing said 
audio/video data and said narration after said label manager instructs said
electronic device to enter said real-time label mode. 

25. (Original) The method of claim 21 wherein said electronic device enters a real- 
time label mode in response to a verbal label-mode command from a system user, 
said verbal label-mode command being recognized and provided to said label 
manager by said speech recognition engine. 

26. (Original) The method of claim 21 wherein said speech recognition engine 
automatically generates said text labels as said electronic device captures said 
audio/video data and said narration.

27. (Original) The method of claim 21 wherein a post processor performs a post- 
processing procedure upon said text labels in a real-time label mode, said post- 
processing procedure including a validation procedure using one or more confidence 
measures to eliminate invalid text labels that fail to satisfy pre-determined validation 
criteria.

28. (Original) The method of claim 21 wherein said label manager stores said
text labels during a real-time label mode, said text labels being stored along with 
meta-information that associates each of said respective subject matter locations 
to a corresponding one of said text labels. 

29. (Original) The method of claim 21 wherein said electronic device initially 
captures said audio/video data and said narration prior to entering said label 
mode. 

30. (Previously Presented) The method of claim 21 wherein said label manager 
instructs said electronic device to enter a non-real-time label mode for creating and 
storing said text labels, said electronic device responsively retrieving and playing 
back said audio/video data and said narration.

31. (Original) The method of claim 21 wherein said speech recognition engine
automatically generates said text labels by analyzing said audio/video data and said
narration as said electronic device plays back said audio/video data and said
narration. 

32. (Original) The method of claim 21 wherein a post processor performs a post- 
processing procedure upon said text labels in a non-real-time label mode, said post- 
processing procedure including a validation procedure using one or more confidence 
measures to eliminate invalid text labels that fail to satisfy pre-determined validation 
criteria. 

33. (Original) The method of claim 21 wherein said label manager coordinates a 
label validation procedure for validating said text labels, said label manager 
generating a validation graphical user interface upon a display of said electronic 
device for a system user to interactively evaluate, delete, and edit said text labels.

34. (Original) The method of claim 21 wherein said label manager coordinates a
label validation procedure for validating said text labels in response to verbal 
validation commands from a system user, said verbal validation commands being 
recognized and provided to said label manager by said speech recognition engine. 

35. (Original) The method of claim 21 wherein said label manager stores said text 
labels in a non-real-time label mode, said text labels being stored along with meta- 
information that associates each of said respective subject matter locations to a 
corresponding one of said text labels. 

36. (Original) The method of claim 21 wherein said label manager instructs said 
electronic device to enter said label search mode during which a system user 
interactively selects a search label for performing a label search procedure to locate a 
specific one of said respective subject matter locations corresponding to said search 
label. 

37. (Original) The method of claim 21 wherein said label manager generates a 
label-search GUI on a display of said electronic device, a system user viewing said
text labels and corresponding representative images from said audio/video data for
selecting a search label. 

38. (Original) The method of claim 21 wherein a system user selects a search label 
by issuing a verbal search-label command, said verbal search-label command being 
recognized and provided to said label manager by said speech recognition engine. 

39. (Original) The method of claim 21 wherein said label manager instructs said 
electronic device to automatically locate and retrieve a specific one of said respective 
subject matter locations in response to a system user selecting a search label.

40. (Original) The method of claim 21 wherein said electronic device automatically
plays back a specific retrieved one of said respective subject matter locations from
said audio/video data for viewing by said system user.

41. (Currently Amended) A computer-readable medium comprising program 
instructions for cataloguing electronic information by: 

capturing audio/video data corresponding to a photographic target by 
utilizing an electronic device, said audio/video data including a
narration concurrently provided by a narrator specifically to
identify where respective subject matter locations are positioned in
said audio/video data;

providing a speech recognition engine that automatically performs a speech 
recognition process upon said narration to generate text labels that
correspond to said respective subject matter locations in said
audio/video data, said text labels being text conversions of
utterances in said narration, said labels each being specifically
aligned with corresponding ones of said respective subject matter
locations within said audio/video data;

managing a label mode for generating and storing said text labels by 
utilizing a label manager; and 

controlling a label search mode with said label manager, said label search 
mode utilizing said text labels to automatically locate said respective 
subject matter locations in said audio/video data.

42. (Currently Amended) A system for cataloguing electronic information,
comprising: 

means for capturing audio/video data corresponding to a photographic
target, said audio/video data including a narration concurrently
provided by a narrator specifically to identify where respective
subject matter locations are positioned in said audio/video data;

means for automatically performing a speech recognition process upon 
said narration to generate text labels that correspond to said 
respective subject matter locations in said audio/video data, said
text labels being text conversions of utterances in said narration,
said labels each being specifically aligned with corresponding ones of
said respective subject matter locations within said audio/video
data;

means for managing a label mode to generate and store said text labels; 
and 

means for controlling a label search mode that utilizes said text labels to 
automatically locate said respective subject matter locations in said 
audio/video data.

43. (Currently Amended) A system for cataloguing electronic information,
comprising: 

an imaging device that captures audio/video data corresponding to
selected photographic targets, said audio/video data including a
verbal narration concurrently provided by a narrator specifically to
identify where respective subject matter locations are
positioned in said audio/video data;

a speech recognition engine that automatically performs a speech
recognition process upon said narration to generate text labels that
are based upon said narration, said text labels corresponding to said
respective subject matter locations in said audio/video data, said
text labels being text conversions of utterances in said narration,
said labels each being specifically aligned with corresponding ones of
said respective subject matter locations within said audio/video
data, said text labels including abbreviated word sequences that
identify said selected photographic targets; and

a label manager that manages a label mode during which said text labels 
are generated by said speech recognition engine, said label manager 
also storing said text labels during said label mode, said text labels 
being stored along with meta-information that associates said 
respective subject matter locations to corresponding ones of said text 
labels, said label manager also controlling a label search mode for 
utilizing said text labels to automatically locate specific 
corresponding ones of said respective subject matter locations from 
said audio/video data, said label manager providing a label-search
user interface upon a display of said imaging device for displaying 
said text labels and corresponding visual images of said respective 
subject matter locations from said audio/video data, a system user
interactively choosing a selected text label by utilizing said label-
search user interface, said imaging device responsively displaying
said audio/video data from a selected subject matter location
corresponding only to said selected text label. 

44. (Currently Amended) A system for cataloguing electronic information, 
comprising: 

an electronic device that captures said electronic information that includes 
verbal narration data concurrently provided specifically to
identify where respective subject matter locations are positioned in
said audio/video data;

a speech recognition engine that analyzes said electronic information to 
generate labels that correspond to said respective subject matter 
locations in said electronic information, said labels being text 
conversions of utterances in said verbal narration data, said labels 
each being specifically aligned with corresponding ones of said 
respective subject matter locations within said audio/video data;
and 

a label manager that utilizes said labels to automatically locate said
respective subject matter locations in said electronic information.

45. (Currently Amended) A system for cataloguing electronic information,
comprising: 

an electronic device that captures audio/video data corresponding to a
photographic target, said audio/video data including a narration
concurrently provided by a narrator to specifically identify
where respective subject matter locations are positioned in said
audio/video data; and

a speech recognition engine that automatically performs a speech
recognition process upon said audio/video data to generate labels
that correspond to said respective subject matter locations in said
audio/video data, said labels being text conversions of utterances in
said narration, said labels each being specifically aligned with
corresponding ones of said respective subject matter locations within
said audio/video data.

46. (Currently Amended) A system for cataloguing electronic information, 
comprising: 

an electronic device that captures audio/video data corresponding to a
photographic target, said audio/video data including a narration
concurrently provided by a narrator specifically to identify
where respective subject matter locations are positioned in said
audio/video data; and 

a label manager that controls a label search mode for utilizing labels
derived from said narration to automatically locate corresponding
ones of said respective subject matter locations in said audio/video
data, said labels each being specifically aligned with corresponding
ones of said respective subject matter locations within said
audio/video data.

47. (Currently Amended) An electronic cataloguing system implemented by:
capturing electronic data which includes a narration concurrently provided
by a narrator specifically to identify where respective subject
matter locations are positioned in said audio/video data;

performing a speech recognition process upon said electronic data to 
automatically generate labels that correspond to said respective 
subject matter locations in said electronic data, said labels being text 
conversions of utterances in said narration, said labels each being 
specifically aligned with corresponding ones of said respective 
subject matter locations within said audio/video data; and

utilizing said labels to automatically locate said respective subject matter 
locations in said electronic data. 

48. (Previously Presented) The system of claim 8 wherein said meta- 
information includes video timecode information. 

49. (Currently Amended) The system of claim 12 wherein said confidence 
measures include a label amplitude parameter and a label duration parameter, 
said label amplitude parameter being based upon a narration amplitude, said 
label duration parameter being based upon a duration of said narration. 

50. (Previously Presented) The system of claim 17 wherein said representative 
images are implemented as thumbnail images. 

51. (Previously Presented) The system of claim 1 wherein said electronic device 
is a single discrete video camcorder that hosts said speech recognition engine, 
said label manager, said labels, and said audio/video data.

52. (New) The system of claim 1 wherein said narration is recorded by a head-
mounted sound-sensor device that is worn in close proximity to said narrator,
said narration being identified for conversion into said labels by having a greater 
amplitude than other ambient sound that is recorded from more remote sources 
as part of said audio/video data.